Stein Variational Policy Gradient
نویسندگان
چکیده
Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but wellbehaved policies. SVPG is robust to initialization and can easily be implemented in a parallel manner. On continuous control problems, we find that implementing SVPG on top of REINFORCE and advantage actor-critic algorithms improves both average return and data efficiency.
منابع مشابه
Variational Inference for Policy Gradient
Inspired by the seminal work on Stein Variational Inference [2] and Stein Variational Policy Gradient [3], we derived a method to generate samples from the posterior variational parameter distribution by explicitly minimizing the KL divergence to match the target distribution in an amortize fashion. Consequently, we applied this varational inference technique into vanilla policy gradient, TRPO ...
متن کاملStein Variational Gradient Descent as Gradient Flow
Stein variational gradient descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate given distributions, based on a gradient-based update that guarantees to optimally decrease the KL divergence within a function space. This paper develops the first theoretical analysis on SVGD. We establish that the empirical measures of the SVGD samples...
متن کاملVAE Learning via Stein Variational Gradient Descent
A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupe...
متن کاملStein Variational Gradient Descent: Theory and Applications
Although optimization can be done very efficiently using gradient-based optimization these days, Bayesian inference or probabilistic sampling has been considered to be much more difficult. Stein variational gradient descent (SVGD) is a new particle-based inference method derived using a functional gradient descent for minimizing KL divergence without explicit parametric assumptions. SVGD can be...
متن کاملStein Variational Autoencoder
A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1704.02399 شماره
صفحات -
تاریخ انتشار 2017